Higher-Order Regret Bounds with Switching Costs

Author

Eyal Gofer

Abstract

This work examines online linear optimization with full information and switching costs (SCs) and focuses on regret bounds that depend on properties of the loss sequences. The SCs considered are bounded functions of a pair of decisions, and regret is augmented with the total SC. We show under general conditions that for any normed SC, σ(x, x′) = ‖x − x′‖, regret cannot be bounded given only a bound Q on the quadratic variation of losses. With an additional bound Λ on the total length of losses, we prove O(√Q + Λ) regret for Regularized Follow the Leader (RFTL). Furthermore, an O(√Q) bound holds for RFTL given a cost ‖x − x′‖². By generalizing the Shrinking Dartboard algorithm, we also show an expected regret bound for the best expert setting with any SC, given bounds on the total loss of the best expert and the quadratic variation of any expert. As SCs vanish, all our bounds depend purely on quadratic variation. We apply our results to pricing options in an arbitrage-free market with proportional transaction costs. In particular, we upper bound the price of "at the money" call options, assuming bounds on the quadratic variation of a stock price and the minimum of summed gains and summed losses.
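The quantities named in the abstract can be illustrated with a small sketch. It is not the paper's construction: it assumes that the quadratic variation Q is the sum of squared Euclidean norms of the loss vectors, that the total length Λ is the sum of their norms, that the decision set is the Euclidean unit ball, and that RFTL uses the regularizer ‖x‖²/2 (so each decision is the projection of −η times the cumulative loss). All function names and the step size `eta` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    return math.sqrt(dot(v, v))


def quadratic_variation(losses):
    # Assumed definition: Q = sum_t ||l_t||^2.
    return sum(dot(l, l) for l in losses)


def total_length(losses):
    # Assumed definition: Lambda = sum_t ||l_t||.
    return sum(norm(l) for l in losses)


def project_to_unit_ball(v):
    n = norm(v)
    return list(v) if n <= 1.0 else [x / n for x in v]


def rftl_decisions(losses, eta):
    # RFTL with regularizer ||x||^2 / 2 on the unit ball (an illustrative choice):
    # x_t = projection of -eta * (l_1 + ... + l_{t-1}).
    cumulative = [0.0] * len(losses[0])
    decisions = []
    for loss in losses:
        decisions.append(project_to_unit_ball([-eta * c for c in cumulative]))
        cumulative = [c + x for c, x in zip(cumulative, loss)]
    return decisions


def norm_cost(x, y):
    # Normed switching cost sigma(x, x') = ||x - x'||.
    return norm([a - b for a, b in zip(x, y)])


def squared_cost(x, y):
    # Squared switching cost ||x - x'||^2.
    return norm_cost(x, y) ** 2


def regret_with_switching(losses, decisions, cost):
    # Regret against the best fixed point of the unit ball, plus total switching cost.
    algorithm_loss = sum(dot(l, x) for l, x in zip(losses, decisions))
    total = [sum(column) for column in zip(*losses)]
    best_fixed_loss = -norm(total)  # min over the unit ball of <total, x>
    switching = sum(cost(a, b) for a, b in zip(decisions, decisions[1:]))
    return algorithm_loss - best_fixed_loss + switching


if __name__ == "__main__":
    losses = [[1.0, 0.0], [-1.0, 0.5], [0.5, -0.5]]
    xs = rftl_decisions(losses, eta=0.5)
    print("Q =", quadratic_variation(losses), "Lambda =", total_length(losses))
    print("regret, normed SC:", regret_with_switching(losses, xs, norm_cost))
    print("regret, squared SC:", regret_with_switching(losses, xs, squared_cost))
```

Running the script on a loss sequence prints Q, Λ, and the SC-augmented regret under both cost functions, which makes it easy to compare the normed and squared cases on the same decisions.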

Related articles

Online Learning with Switching Costs and Other Adaptive Adversaries

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player’s performance using a new notion of regret, also known as policy regret, which better captures the adversary’s adaptiveness to the player’s behavior. In a setting where losses are allowed to drift, we...

Multi-Armed Bandits with Metric Movement Costs

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution giv...

SpecWatch: A Framework for Adversarial Spectrum Monitoring with Unknown Statistics

In cognitive radio networks (CRNs), dynamic spectrum access has been proposed to improve spectrum utilization, but it also generates spectrum misuse problems. One common solution to these problems is to deploy monitors to detect misbehavior on a given channel. However, in multi-channel CRNs, it is very costly to deploy monitors on every channel. With a limited number of monitors, we have t...

Online learning with graph-structured feedback against adaptive adversaries

We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of Õ(T ) and Õ(T ) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bound...

Online Regret Bounds for Markov Decision Processes with Deterministic Transitions

We consider an upper confidence bound algorithm for Markov decision processes (MDPs) with deterministic transitions. For this algorithm we derive upper bounds on the online regret (with respect to an (ε-)optimal policy) that are logarithmic in the number of steps taken. These bounds also match known asymptotic bounds for the general MDP setting. We also present corresponding lower bounds. As an...

Publication date: 2014
